Japanese Word Segmentation Using Similarity Measure for IR

Authors

  • Tomohiro Ozawa
  • Mikio Yamamoto
  • Kyoji Umemura
  • Kenneth Ward Church
Abstract

It remains an open question what are the best units for IR, in particular for Asian languages: words, phrases, bigrams, or n-grams. Our proposal is that the best units are what maximize a similarity measure between a query and a document. That is, in this framework, the IR system should have different representations of a query for each document. We develop the method which segments a query into unigrams, bigrams, and arbitrary-length n-grams, using a similarity measure such as tf-idf as the criteria for the segmentation. Experimental results show that the method takes advantage of technical terms, which tend to be longer than bigrams, and integrates the advantages of the word-based method and the n-gram-based method without drawbacks of both.
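
The idea of choosing, for each document, the query segmentation that maximizes a similarity score can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given on this page. It is a minimal dynamic program written under stated assumptions: the similarity is taken to be the plain sum of tf-idf weights of the segments, tf is a non-overlapping substring count, idf is log(N / df) over a toy corpus of unsegmented strings, and the names `tfidf` and `best_segmentation` are hypothetical.

```python
import math


def tfidf(segment, doc, corpus):
    """tf-idf weight of `segment` for `doc`; 0.0 if the document lacks it.

    Assumption: tf is the non-overlapping substring count and
    idf is log(N / df), with df floored at 1 for documents outside the corpus.
    """
    tf = doc.count(segment)
    if tf == 0:
        return 0.0
    df = sum(1 for d in corpus if segment in d)
    return tf * math.log(len(corpus) / max(df, 1))


def best_segmentation(query, doc, corpus):
    """Split `query` into contiguous n-grams with maximal summed tf-idf for `doc`."""
    n = len(query)
    best = [0.0] + [-math.inf] * n  # best[i]: highest score for query[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last segment
    for i in range(1, n + 1):
        for j in range(i):
            score = best[j] + tfidf(query[j:i], doc, corpus)
            if score > best[i]:
                best[i], back[i] = score, j
    # Walk the back pointers to recover the segments in order.
    segments, i = [], n
    while i > 0:
        segments.append(query[back[i]:i])
        i = back[i]
    return segments[::-1], best[n]
```

Because the score depends on the document, calling `best_segmentation` with the same query and two different documents can return two different segmentations, which mirrors the abstract's point that a query has a different representation for each document. Summing tf-idf weights is only one possible similarity measure; another measure would change which segmentation wins.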

Similar resources

Chinese Word Segmentation Accuracy and Its Effects on Information Retrieval

In Chinese information retrieval (IR), word segmentation is an essential prerequisite process to break down the documents into smaller linguistic units or word segments so that they can be indexed for subsequent retrieval. Despite a host of Chinese information systems that are in existence today, little work has been done to study word segmentation accuracy and its effect on IR. This article de...

A Japanese-to-English Statistical Machine Translation System for Technical Documents

This thesis addresses a Japanese-to-English statistical machine translation (SMT) system for technical documents. Machine translation (MT) is a promising solution for growing translation needs. Japanese-to-English MT is one of the most difficult language pairs due to their large lexical and syntactic differences. This thesis work focuses on patents as the most demanded technical documents that ...

Berkeley at NTCIR-2: Chinese, Japanese, and English IR experiments

This paper reports on the work of Berkeley group at the second NTCIR workshop on Japanese & English IR and Chinese IR. A number of runs were submitted on all subtasks in the two main tasks. Our main focus on the Japanese monolingual subtask was on comparing the retrieval effectiveness of different segmentation methods. The experimental results show the bigram indexing outperformed the word-base...

Cohesion and Collocation: Using Context Vectors in Text Segmentation

Collocational word similarity is considered a source of text cohesion that is hard to measure and quantify. The work presented here explores the use of information from a training corpus in measuring word similarity and evaluates the method in the text segmentation task. An implementation, the VecTile system, produces similarity curves over texts using pre-compiled vector representations of the...

Strategies of Processing Japanese Names and Character Variants in Traditional Chinese Text

This paper proposes an approach to identify word candidates that are not Traditional Chinese, including Japanese names (written in Japanese Kanji or Traditional Chinese characters) and word variants, when doing word segmentation on Traditional Chinese text. When handling personal names, a probability model concerning formats of names is introduced. We also propose a method to map Japanese Kanji...

Publication date: 1999